{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Run a model with config files\n",
    "\n",
    "FuxiCTR version: v1.0\n",
    "\n",
    "This tutorial shows how to use YAML config files to define dataset and model hyper-parameters, and then run the model. \n",
    "\n",
    "We take `DeepFM_with_config.py` in the demo directory as an example. The config files are located in [demo/demo_config](https://github.com/xue-pai/FuxiCTR/tree/main/demo/demo_config) folder.\n",
    "\n",
    "The dataset config `dataset_config.yaml` is as follows:\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "### Tiny data for tests only\n",
    "taobao_tiny_data:\n",
    "    data_root: ../data/\n",
    "    data_format: csv\n",
    "    train_data: ../data/tiny_data/train_sample.csv\n",
    "    valid_data: ../data/tiny_data/valid_sample.csv\n",
    "    test_data: ../data/tiny_data/test_sample.csv\n",
    "    min_categr_count: 1\n",
    "    feature_cols:\n",
    "        - {name: [\"userid\",\"adgroup_id\",\"pid\",\"cate_id\",\"campaign_id\",\"customer\",\"brand\",\"cms_segid\",\n",
    "                  \"cms_group_id\",\"final_gender_code\",\"age_level\",\"pvalue_level\",\"shopping_level\",\"occupation\"], \n",
    "                  active: True, dtype: str, type: categorical}\n",
    "    label_col: {name: clk, dtype: float}"
   ]
  },
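  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In `feature_cols`, the `name` field holds a list of column names, so that several features share one set of attributes (`active`, `dtype`, `type`). The sketch below illustrates one way such a grouped entry could be expanded into one entry per column. It is a plain-Python illustration under stated assumptions, not FuxiCTR's actual code: the helper `expand_feature_cols` is a hypothetical name, and the column list is shortened for brevity."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Hypothetical helper for illustration only (not part of FuxiCTR)\n",
    "def expand_feature_cols(feature_cols):\n",
    "    # Return one spec per column, copying the shared attributes\n",
    "    expanded = []\n",
    "    for col in feature_cols:\n",
    "        names = col['name'] if isinstance(col['name'], list) else [col['name']]\n",
    "        for name in names:\n",
    "            expanded.append({**col, 'name': name})\n",
    "    return expanded\n",
    "\n",
    "# A shortened version of the grouped entry from dataset_config.yaml\n",
    "feature_cols = [\n",
    "    {'name': ['userid', 'adgroup_id', 'pid'],\n",
    "     'active': True, 'dtype': 'str', 'type': 'categorical'},\n",
    "]\n",
    "\n",
    "for spec in expand_feature_cols(feature_cols):\n",
    "    print(spec)"
   ]
  },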
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The model config `model_config.yaml` is as follows. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# The `Base` can be shared by different expid settings\n",
    "Base: \n",
    "    model_root: '../checkpoints/'\n",
    "    workers: 3\n",
    "    verbose: 1\n",
    "    patience: 2\n",
    "    pickle_feature_encoder: True\n",
    "    use_hdf5: True\n",
    "    save_best_only: True\n",
    "    every_x_epochs: 1\n",
    "    debug: False\n",
    "\n",
    "# The expid should be unique among all settings\n",
    "DeepFM_test:\n",
    "    model: DeepFM\n",
    "    dataset_id: taobao_tiny_data # each expid corresponds to a dataset_id\n",
    "    loss: 'binary_crossentropy'\n",
    "    metrics: ['logloss', 'AUC']\n",
    "    task: binary_classification\n",
    "    optimizer: adam\n",
    "    hidden_units: [64, 32]\n",
    "    hidden_activations: relu\n",
    "    net_regularizer: 0\n",
    "    embedding_regularizer: 1.e-8\n",
    "    learning_rate: 1.e-3\n",
    "    batch_norm: False\n",
    "    net_dropout: 0\n",
    "    batch_size: 128\n",
    "    embedding_dim: 4\n",
    "    epochs: 1\n",
    "    shuffle: True\n",
    "    seed: 2019\n",
    "    monitor: 'AUC'\n",
    "    monitor_mode: 'max'"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We use the `Base` to keep some common hyper-paramerters that could be shared by different expid settings. It is also flexible to merge all the key-values pairs in `Base` to `FM_test` for your convenience.\n",
    "\n",
    "Note that the naming `dataset_config` and `model_config` should keep unchanged. Both dataset config and model config should be kept in the same directory: either 1) put dataset_config.yaml and model_config.yaml as shown in [./demo/demo_config](https://github.com/xue-pai/FuxiCTR/tree/main/demo/demo_config), or 2) put in dataset_config and model_config folders as shown in [./config](https://github.com/xue-pai/FuxiCTR/tree/main/config) when a bunch of config files are available. "
   ]
  },
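  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make the relationship between `Base` and an expid concrete, the sketch below combines the two groups using plain Python dictionaries. It only illustrates the idea that the shared settings and the expid-specific settings end up in one parameter dictionary. The function name `merge_with_base` and the rule that the expid value wins when a key appears in both groups are assumptions, not a description of how `load_config` is implemented. Only a few keys from the model config are reproduced."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Hypothetical helper for illustration only (not part of FuxiCTR)\n",
    "def merge_with_base(configs, expid):\n",
    "    params = dict(configs.get('Base', {}))  # start from the shared settings\n",
    "    params.update(configs[expid])           # assumption: expid values win on a clash\n",
    "    return params\n",
    "\n",
    "# A few keys copied from model_config.yaml\n",
    "configs = {\n",
    "    'Base': {'model_root': '../checkpoints/', 'workers': 3, 'patience': 2},\n",
    "    'DeepFM_test': {'model': 'DeepFM', 'dataset_id': 'taobao_tiny_data',\n",
    "                    'batch_size': 128, 'epochs': 1},\n",
    "}\n",
    "\n",
    "params = merge_with_base(configs, 'DeepFM_test')\n",
    "print(params)"
   ]
  },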
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import sys\n",
    "import os\n",
    "from fuxictr.datasets import data_generator\n",
    "from fuxictr.datasets.taobao import FeatureEncoder\n",
    "from datetime import datetime\n",
    "from fuxictr.utils import set_logger, print_to_json, load_config\n",
    "import logging\n",
    "from fuxictr.pytorch.models import DeepFM\n",
    "from fuxictr.pytorch.utils import seed_everything\n",
    "\n",
    "if __name__ == '__main__':\n",
    "    # Load params from config files\n",
    "    config_dir = 'demo_config'\n",
    "    experiment_id = 'DeepFM_test'\n",
    "    params = load_config(config_dir, experiment_id)\n",
    "\n",
    "    # set up logger and random seed\n",
    "    set_logger(params)\n",
    "    logging.info(print_to_json(params))\n",
    "    seed_everything(seed=params['seed'])\n",
    "\n",
    "    # Set up feature encoder\n",
    "    feature_encoder = FeatureEncoder(**params)\n",
    "    feature_encoder.fit(train_data=params['train_data'], \n",
    "                        min_categr_count=params['min_categr_count'])\n",
    "\n",
    "    # Build train/validation/test data generators\n",
    "    train_gen, valid_gen, test_gen = data_generator(feature_encoder,\n",
    "                                                    train_data=params['train_data'],\n",
    "                                                    valid_data=params['valid_data'],\n",
    "                                                    test_data=params['test_data'],\n",
    "                                                    batch_size=params['batch_size'],\n",
    "                                                    shuffle=params['shuffle'],\n",
    "                                                    use_hdf5=params['use_hdf5'])\n",
    "    # Build a DeepFM model\n",
    "    model = DeepFM(feature_encoder.feature_map, **params)\n",
    "    model.fit_generator(train_gen, validation_data=valid_gen, epochs=params['epochs'],\n",
    "                        verbose=params['verbose'])\n",
    "   \n",
    "    # Reloading weights of the best checkpoint\n",
    "    model.load_weights(model.checkpoint)\n",
    "\n",
    "    # Evalution on validation\n",
    "    logging.info('***** validation results *****')\n",
    "    model.evaluate_generator(valid_gen)\n",
    "\n",
    "    # Evalution on test\n",
    "    logging.info('***** test results *****')\n",
    "    model.evaluate_generator(test_gen)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For easy use, we also provide a useful tool script `run_expid.py` to run FuxiCTR models based on YAML config files. \n",
    "\n",
    "+ --config: The config directory of data and model config files.\n",
    "+ --expid: The given expid that denotes the detailed experimental settings.\n",
    "+ --gpu: The gpu index used for experiment, and -1 for CPU.\n",
    "\n",
    "Try the following examples:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!cd benchmarks\n",
    "# run the demo config\n",
    "!python run_expid.py --config ../demo/demo_config --expid DeepFM_test --gpu 0\n",
    "# run DeepFM_test, located in config/model_config/tests.yaml\n",
    "!python run_expid.py --config ../config --expid DeepFM_test --gpu 0"
   ]
  }
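  ,
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The three flags described earlier could be parsed with `argparse` roughly as follows. This is only a sketch: the actual `run_expid.py` may declare its arguments differently, and the default value for `--gpu` as well as the mapping from a GPU index to a device string are assumptions made for illustration."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import argparse\n",
    "\n",
    "# Sketch only: the real run_expid.py may define its arguments differently\n",
    "def parse_args(argv=None):\n",
    "    parser = argparse.ArgumentParser()\n",
    "    parser.add_argument('--config', type=str, help='directory of the dataset and model config files')\n",
    "    parser.add_argument('--expid', type=str, help='experiment ID to run')\n",
    "    parser.add_argument('--gpu', type=int, default=-1, help='GPU index, or -1 for CPU')\n",
    "    return parser.parse_args(argv)\n",
    "\n",
    "args = parse_args(['--config', '../demo/demo_config', '--expid', 'DeepFM_test', '--gpu', '-1'])\n",
    "\n",
    "# Assumption: a negative index selects the CPU, otherwise a CUDA device string is built\n",
    "device = 'cpu' if args.gpu < 0 else 'cuda:{}'.format(args.gpu)\n",
    "print(args.config, args.expid, device)"
   ]
  }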
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
